Speaker normalization for automatic speech recognition - An on-line approach
Abstract
We propose a method to transform the on-line speech signal so that it complies with the specifications of an HMM-based automatic speech recognizer. The spectrum of the input signal undergoes a vocal tract length (VTL) normalization based on differences in the average third formant, F3. The high-frequency gap generated by the scaling is estimated by means of an extrapolation scheme. Mel-scale cepstral coefficients (MFCC) are used along with delta and delta-delta cepstra as well as delta and delta-delta energy. The method has been tested on the TI digits database, which contains adult and children's speech, and provides substantial gains with respect to non-normalized speech.
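The scaling and gap-filling steps described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the warping factor is taken as the ratio of a reference average F3 to the speaker's average F3, the frequency axis is scaled linearly, and the high-frequency gap is filled by a straight-line fit to the log magnitude of the last valid bins. The function names `warp_factor` and `vtl_normalize` and the parameter `fit_bins` are hypothetical, and the extrapolation shown is a placeholder rather than the authors' scheme.

```python
import numpy as np

def warp_factor(f3_speaker_hz, f3_reference_hz):
    """Assumed warping factor: reference average F3 over speaker average F3."""
    return f3_reference_hz / f3_speaker_hz

def vtl_normalize(mag, alpha, fit_bins=8):
    """Linearly scale the frequency axis of a magnitude spectrum by alpha.

    Output bin k takes the input value at bin k / alpha. For alpha < 1 the
    spectrum is compressed, which leaves the top bins without source data
    (the high-frequency gap). The gap is filled here by a log-linear
    extrapolation, which is an assumption made for illustration only.
    """
    mag = np.asarray(mag, dtype=float)
    n = mag.size
    bins = np.arange(n)
    src = bins / alpha                # source position of each output bin
    valid = src <= n - 1              # bins that still have source data
    out = np.empty(n)
    out[valid] = np.interp(src[valid], bins, mag)
    if not valid.all():
        last = np.flatnonzero(valid)[-1]
        lo = max(0, last - fit_bins + 1)
        if last - lo + 1 >= 2:
            # Fit a line to the log magnitude of the last valid bins.
            x = bins[lo:last + 1]
            y = np.log(out[lo:last + 1] + 1e-12)
            slope, intercept = np.polyfit(x, y, 1)
            out[~valid] = np.exp(slope * bins[~valid] + intercept)
        else:
            # Too few points for a fit: hold the last valid value.
            out[~valid] = out[last]
    return out

# Example: a child speaker with a higher average F3 than the reference.
alpha = warp_factor(f3_speaker_hz=3500.0, f3_reference_hz=2800.0)
spectrum = np.exp(-0.01 * np.arange(257))   # synthetic magnitude spectrum
normalized = vtl_normalize(spectrum, alpha)
```

With these example values the factor is below one, so the spectrum is compressed toward lower frequencies and the top bins are the ones filled by extrapolation. In an on-line setting the average F3 would have to be updated as speech arrives; how that is done is not stated in the abstract and is not modeled here.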
Similar resources
Fast estimation of warping factors in vocal tract length normalization using scores obtained from gender-detection modeling
The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...
Impact of Vocal Tract Length Normalization on the Speech Recognition Performance of an English Vowel Phoneme Recognizer for the Recognition of Children Voices
Differences in human vocal tract lengths can cause inter-speaker acoustic variability in speech signals spoken by different speakers for the same textual version, and due to these variations the robustness of a speaker-independent (SI) speech recognition system is affected. Speaker normalization using vocal tract length normalization (VTLN) is an effective approach to reduce the effect of these...
Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization
A proven method for achieving effective automatic speech recognition (ASR) due to speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods are needed which require limited computing resources for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that i...
Speaker Normalization for Improved Automatic Speech Recognition for Digital Libraries
Wei Wang, Old Dominion University, 2004. Director: Dr. Stephen A. Zahorian. The context of the thesis work is the improvement of automatic speech recognition (ASR) for use with digital libraries. First, commonly used multimedia file formats and codecs are surveyed with the objective of identifying those formats t...
Speaker-independent silent speech recognition with across-speaker articulatory normalization and speaker adaptive training
Silent speech recognition (SSR) converts non-audio information (e.g., articulatory information) to speech. SSR has potential to enable laryngectomees to produce synthesized speech with a natural sounding voice. Despite its recent advances, current SSR research has largely relied on speaker-dependent recognition. High degree of variation in articulatory patterns across different talkers has been...